Anonymization Methods for Taxonomic Microdata
Abstract
Microdata sets often contain attributes that are neither numerical nor ordinal but take nominal values from a taxonomy, ontology or classification (e.g. diagnosis in a medical data set about patients, or economic activity in an economic data set). Such data sets must be anonymized if they are transferred outside the data collector's premises (e.g. a hospital or national statistical office), say, for research purposes. The literature on microdata anonymization methods for nominal data is relatively limited. Multiple imputation is a common choice for such data, but it runs into computational problems when nominal attributes can take many different values. In this paper, we provide anonymization methods for data sets that include nominal taxonomic attributes with many possible values. We show how two anonymization methods originally designed for numerical attributes, data shuffling and microaggregation, can be adapted to taxonomic attributes. The adaptation relies on a hierarchy-aware numerical mapping of nominal categories, which we call marginality. The resulting methods circumvent the computational problems of multiple imputation and take the semantics of the taxonomy into account.
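To make the microaggregation idea concrete, here is a minimal Python sketch of generic univariate fixed-size microaggregation on numerical scores. It is not the paper's adapted method: the abstract does not define the marginality mapping, so the input `scores` is simply assumed to be some numerical value already assigned to each record's category. The function name `microaggregate` and the sample data are hypothetical.

```python
from statistics import mean

def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n < k:
        raise ValueError("need at least k records")
    # Sort record indices by value so that each group holds similar records.
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    result = [0.0] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so every group has >= k records.
        end = n if g == n_groups - 1 else start + k
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
    return result

# Assumed numerical scores, one per record (hypothetical data).
scores = [1.0, 9.0, 2.0, 8.0, 3.0, 10.0, 4.0]
masked = microaggregate(scores, k=3)
print(masked)
```

Each released value is shared by at least k records, which is what gives microaggregation its k-anonymity-like protection. For a taxonomic attribute, the group centroid would still have to be mapped back to a category, a step this sketch leaves out.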
Related resources
An Algorithm for k-Anonymity-Based Fingerprinting
The anonymization of sensitive microdata (e.g. medical health records) is a widely-studied topic in the research community. A still unsolved problem is the limited informative value of anonymized microdata that often rules out further processing (e.g. statistical analysis). Thus, a tradeoff between anonymity and data precision has to be made, resulting in the release of partially anonymized mic...
On the Identification of Property Based Generalizations in Microdata Anonymization
The majority of the search algorithms in microdata anonymization restrict themselves to a single privacy property and a single criterion to optimize. The solutions obtained are therefore of limited application since adherence to multiple privacy models is required to impede different forms of privacy attacks. Towards this end, we propose the concept of a property based generalization (PBG) to captur...
Distribution-based Microdata Anonymization
Before sharing to support ad hoc aggregate analyses, microdata often need to be anonymized to protect the privacy of individuals. A variety of privacy models have been proposed for microdata anonymization. Many of these models (e.g., t-closeness) essentially require that, after anonymization, groups of sensitive attribute values follow specified distributions. To support such models, in this pap...
Anonymization of statistical data
In the modern digital society, personal information about individuals can be collected, stored, shared, and disseminated much more easily and freely. Such data can be released in macrodata form, reporting aggregated information, or in microdata form, reporting specific information on individual respondents. Protecting data against improper disclosure is then becoming critical to ensure proper pr...